On Optimal Partitioning For Sparse Matrices In Variable Block Row Format
The Variable Block Row (VBR) format is an influential blocked sparse matrix
format designed to represent shared sparsity structure between adjacent rows
and columns. VBR consists of groups of adjacent rows and columns, storing the
resulting blocks that contain nonzeros in a dense format. This reduces the
memory footprint and enables optimizations such as register blocking and
instruction-level parallelism. Existing approaches use heuristics to determine
which rows and columns should be grouped together. We adapt and optimize a
dynamic programming algorithm for sequential hypergraph partitioning to produce
a linear time algorithm which can determine the optimal partition of rows under
an expressive cost model, assuming the column partition remains fixed.
Furthermore, we show that the problem of determining an optimal partition for
the rows and columns simultaneously is NP-Hard under a simple linear cost
model.
To evaluate our algorithm empirically against existing heuristics, we
introduce the 1D-VBR format, a specialization of VBR format where columns are
left ungrouped. We evaluate our algorithms on all 1626 real-valued matrices in
the SuiteSparse Matrix Collection. When asked to minimize an empirically
derived cost model for a sparse matrix-vector multiplication kernel, our
algorithm produced partitions whose 1D-VBR realizations achieve a speedup of at
least 1.18 over an unblocked kernel on 25% of the matrices, and a speedup of at
least 1.59 on 12.5% of the matrices. The 1D-VBR representation produced by our
algorithm had faster SpMVs than the 1D-VBR representations produced by any
existing heuristics on 87.8% of the test matrices.
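The row-partitioning problem in the abstract above can be illustrated with a small dynamic program. This is only a sketch under stated assumptions, not the paper's linear-time algorithm or its cost model: the function name `partition_rows`, the `max_height` limit, and the toy cost (rows in the group times distinct columns touched, plus a fixed `block_overhead`) are all hypothetical choices made for illustration. Columns are left ungrouped, as in the 1D-VBR setting.

```python
def partition_rows(row_patterns, max_height=4, block_overhead=2):
    """Split rows into contiguous groups minimizing a toy storage cost.

    row_patterns: list of sets, the column indices of nonzeros in each row.
    Hypothetical cost of one group: height * |union of its column sets|
    + block_overhead. This is an illustrative model, not the paper's.
    """
    n = len(row_patterns)
    best = [0] + [float("inf")] * n  # best[i]: min cost for rows[:i]
    cut = [0] * (n + 1)              # cut[i]: start of the last group for rows[:i]
    for i in range(1, n + 1):
        cols = set()
        # Grow the last group backwards from row i-1, up to max_height rows.
        for j in range(i - 1, max(i - max_height, 0) - 1, -1):
            cols |= row_patterns[j]
            cost = best[j] + (i - j) * len(cols) + block_overhead
            if cost < best[i]:
                best[i] = cost
                cut[i] = j
    # Walk the cut pointers back to recover the groups as (start, end) pairs.
    groups = []
    i = n
    while i > 0:
        groups.append((cut[i], i))
        i = cut[i]
    groups.reverse()
    return best[n], groups


rows = [{0, 1}, {0, 1}, {5}, {5, 6}]
total_cost, groups = partition_rows(rows)
print(total_cost, groups)
```

With these four rows, the first two share a pattern and are grouped together, while the last two are merged because padding one entry is cheaper than paying a second block overhead. The sketch runs in O(n · max_height) set unions rather than linear time.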
A Parallel Solver for Graph Laplacians
Problems from graph drawing, spectral clustering, network flow and graph
partitioning can all be expressed in terms of graph Laplacian matrices. There
are a variety of practical approaches to solving these problems in serial.
However, as problem sizes increase and single core speeds stagnate, parallelism
is essential to solve such problems quickly. We present an unsmoothed
aggregation multigrid method for solving graph Laplacians in a distributed
memory setting. We introduce new parallel aggregation and low degree
elimination algorithms targeted specifically at irregular degree graphs. These
algorithms are expressed in terms of sparse matrix-vector products using
generalized sum and product operations. This formulation is amenable to linear
algebra using arbitrary distributions and allows us to operate on a 2D sparse
matrix distribution, which is necessary for parallel scalability. Our solver
outperforms the natural parallel extension of the current state of the art in
an algorithmic comparison. We demonstrate scalability to 576 processes and
graphs with up to 1.7 billion edges.
Comment: PASC '18, Code: https://github.com/ligmg/ligm
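The idea of a sparse matrix-vector product with generalized sum and product operations, mentioned in the abstract above, can be sketched as follows. This is a minimal serial illustration, not the solver's aggregation or elimination algorithms: the function name `semiring_spmv`, the CSR inputs, and the example graph are assumptions introduced here.

```python
import operator


def semiring_spmv(indptr, indices, data, x, add, mul, zero):
    """Compute y = A x over a user-supplied (add, mul) pair.

    A is given in CSR form (indptr, indices, data); `zero` is the
    identity element of `add`, used to start each row's accumulation.
    """
    y = []
    for row in range(len(indptr) - 1):
        acc = zero
        for k in range(indptr[row], indptr[row + 1]):
            acc = add(acc, mul(data[k], x[indices[k]]))
        y.append(acc)
    return y


# Hypothetical path graph 0 - 1 - 2 with edge weights 1.0 and 2.0.
indptr = [0, 1, 3, 4]
indices = [1, 0, 2, 1]
data = [1.0, 1.0, 2.0, 2.0]

# Ordinary (+, *): multiplying by the all-ones vector gives weighted degrees.
degrees = semiring_spmv(indptr, indices, data, [1.0, 1.0, 1.0],
                        operator.add, operator.mul, 0.0)

# (min, +): one relaxation step of shortest-path distances from vertex 0.
inf = float("inf")
dist = semiring_spmv(indptr, indices, data, [0.0, inf, inf],
                     min, operator.add, inf)
print(degrees, dist)
```

Swapping the operator pair changes what the same kernel computes, which is why graph algorithms written this way can reuse a single distributed SpMV implementation.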
Jet: Multilevel Graph Partitioning on GPUs
The multilevel heuristic is the dominant strategy for high-quality sequential
and parallel graph partitioning. Partition refinement is a key step of
multilevel graph partitioning. In this work, we present Jet, a new parallel
algorithm for partition refinement specifically designed for Graphics
Processing Units (GPUs). We combine Jet with GPU-aware coarsening to develop a
k-way graph partitioner. The new partitioner achieves superior quality when
compared to state-of-the-art shared memory graph partitioners on a large
collection of test graphs.
Comment: Submitted as a non-archival track paper for SIAM ACDA 202